For a model to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn domain-invariant feature representations and capture the underlying semantics that make up an object category. Recent advances in weakly supervised vision-language models, which learn holistic representations from cheap, noisy text annotations, have shown their capability at semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these weakly supervised vision-language approaches to achieve the best generalization on unseen domains. Motivated by this, we study how the multimodal information in existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks, along with the visual modality, to enhance generalization to unseen domains at test time. We experiment on several domain generalization settings (ClosedDG, OpenDG, and Limited Sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.
Spatio-temporal modeling, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperforms state-of-the-art baselines by a large margin on all three datasets (over 27% improvement in MAE and 34% in RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and be robustly adaptive to different anomalous situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
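The meta-graph idea can be illustrated with a minimal numpy sketch (not the authors' implementation; all names, shapes, and the adjacency construction are illustrative, in the spirit of adaptive-adjacency graph learning): node hidden states from the recurrent encoder query a small memory of prototype embeddings, and the resulting node embeddings induce a graph.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_nodes, d_model, bank_size = 8, 16, 4

# Hypothetical meta-node bank: a small memory of prototype embeddings
# (learnable in a real model; random stand-ins here).
meta_node_bank = rng.normal(size=(bank_size, d_model))

# Per-node hidden states produced by the recurrent encoder (stand-ins here).
hidden = rng.normal(size=(num_nodes, d_model))

# Query the bank with attention to obtain time-varying node embeddings.
attn = softmax(hidden @ meta_node_bank.T)   # (num_nodes, bank_size)
node_emb = attn @ meta_node_bank            # (num_nodes, d_model)

# Derive a graph from the embeddings: each row is a normalized
# similarity distribution over neighbors.
adjacency = softmax(np.maximum(node_emb @ node_emb.T, 0.0))
```

Because the bank is shared across time steps while the queries change with the encoder state, the induced adjacency can drift as the input stream becomes non-stationary.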
We introduce an ensemble learning method based on Gaussian Process Regression (GPR) for predicting conditional expected stock returns given stock-level and macro-economic information. Our ensemble learning approach significantly reduces the computational complexity inherent in GPR inference and lends itself to general online learning tasks. We conduct an empirical analysis on a large cross-section of US stocks from 1962 to 2016. We find that our method dominates existing machine learning models statistically and economically in terms of out-of-sample $R$-squared and Sharpe ratio of prediction-sorted portfolios. Exploiting the Bayesian nature of GPR, we introduce the mean-variance optimal portfolio with respect to the predictive uncertainty distribution of the expected stock returns. It appeals to an uncertainty averse investor and significantly dominates the equal- and value-weighted prediction-sorted portfolios, which outperform the S&P 500.
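The computational saving can be seen in a minimal numpy sketch of the underlying idea (exact GP regression fitted on random subsets, with predictive means averaged across the ensemble); the RBF kernel, subset sizes, and synthetic data are illustrative and not the paper's setup.

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two sets of points.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gpr_mean(X, y, Xs, noise=0.1):
    # Exact GP posterior mean: k(Xs, X) (K + sigma^2 I)^{-1} y.
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xs, X) @ np.linalg.solve(K, y)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
Xs = np.linspace(-3, 3, 50)[:, None]

# Ensemble: each member fits an exact GP on a random subset of size m,
# costing O(m^3) instead of O(n^3) for the full GP; means are averaged.
members = [rng.choice(len(X), size=50, replace=False) for _ in range(10)]
preds = np.mean([gpr_mean(X[idx], y[idx], Xs) for idx in members], axis=0)
```

In an online setting, new observations only need to be routed to (or replace points in) individual members, so no full-dataset refit is required.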
Traffic forecasting, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in the traffic stream, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner powered by a Meta-Node Bank into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset that contains traffic incident information. Our model outperforms state-of-the-art baselines by a large margin on all three datasets (over 27% improvement in MAE and 34% in RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle road links and time slots with different patterns and be robustly adaptive to anomalous traffic situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
The last few years have seen a lot of work to address the challenge of low-latency and high-throughput convolutional neural network inference. Integrated photonics has the potential to dramatically accelerate neural networks because of its low-latency nature. Combined with the concept of the Joint Transform Correlator (JTC), the computationally expensive convolution functions can be computed instantaneously (at the time of flight of light) with almost no cost. This 'free' convolution computation provides the theoretical basis of the proposed PhotoFourier JTC-based CNN accelerator. PhotoFourier addresses a myriad of challenges posed by on-chip photonic computing in the Fourier domain, including 1D lenses and high-cost optoelectronic conversions. The proposed PhotoFourier accelerator achieves more than 28X better energy-delay product compared to state-of-the-art photonic neural network accelerators.
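The 'free' convolution rests on the convolution theorem: a pointwise product of spectra is equivalent to a circular convolution in the original domain, which is the operation an optical Fourier setup evaluates at the speed of light. A small numerical check of that identity (electronic and purely illustrative, not a model of the optics):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=64)  # signal
k = rng.normal(size=64)  # kernel

# Fourier route: multiply spectra, transform back.
fourier = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

# Direct circular convolution, y[n] = sum_m x[m] * k[(n - m) mod N].
direct = np.array([sum(x[m] * k[(n - m) % 64] for m in range(64))
                   for n in range(64)])

assert np.allclose(fourier, direct)
```

In a digital system the two FFTs dominate the cost; in the JTC setting the transform is performed by free-space propagation, which is why only the (opto)electronic conversions remain expensive.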
In image classification, there have been many developments in detecting out-of-distribution (OOD) data. However, most OOD detection methods are evaluated on a standard set of datasets that are arbitrarily different from the training data, and there is no clear definition of what makes a 'good' OOD dataset. Moreover, state-of-the-art OOD detection methods already achieve near-perfect results on these standard benchmarks. In this paper, we define 2 categories of OOD data using the subtle notions of perceptual/visual and semantic similarity to in-distribution (ID) data. We define near-OOD samples as those that are perceptually similar to but semantically different from ID samples, and shifted samples as those that are visually different from but semantically similar to ID data. We then propose a GAN-based framework for generating OOD samples from each of these two categories, given an ID dataset. Through extensive experiments on MNIST, CIFAR-10/100, and ImageNet, we show that state-of-the-art OOD detection methods that perform well on conventional benchmarks are significantly less robust on our proposed benchmarks, and vice versa, thus suggesting that even a separate OOD set may not be needed to reliably evaluate performance in OOD detection.
There has been long-standing interest in characterizing the failure behavior of object detectors by finding images on which they are likely to perform poorly. In real-world applications such as autonomous driving, it is also crucial to characterize potential failures beyond simple detection-performance requirements. For example, a missed detection of a pedestrian close to the ego vehicle generally warrants closer inspection than a missed detection of a distant car. The problem of predicting such potential failures at test time has been overlooked in the literature, and conventional approaches based on detection uncertainty are agnostic to such fine-grained characterizations of errors. In this work, we propose to reformulate the problem of finding 'hard' images as a query-based hard-image retrieval task, where queries are specific definitions of 'hardness', and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground truth. We show experimentally that it can be applied successfully to a wide variety of queries, for which it can reliably identify hard images for a given detector without any labeled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN object detectors.
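The ground-truth-free Monte Carlo idea can be sketched in a toy form (this is not the paper's method; the stochastic model, the query, and all scores below are deliberately simplified stand-ins): treat each candidate detection as a real object with probability equal to its confidence score, then estimate the expected value of a hardness query, here "number of real objects the thresholded detector misses", by sampling.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-image candidate detection scores (one array per image).
images = [rng.uniform(size=rng.integers(3, 10)) for _ in range(5)]
threshold = 0.5  # operating threshold: boxes below it are not reported

def hardness(scores, n_samples=20000, rng=rng):
    # Toy stochastic ground-truth model: each candidate is a real object
    # with probability equal to its score. Query: expected number of real
    # objects falling below the operating threshold (i.e., misses).
    draws = rng.uniform(size=(n_samples, len(scores))) < scores
    misses = (draws & (scores < threshold)).sum(axis=1)
    return misses.mean()

# Rank images from hardest to easiest under this query, no labels needed.
ranking = sorted(range(len(images)), key=lambda i: -hardness(images[i]))
```

For this particular query the expectation also has a closed form (the sum of sub-threshold scores), which makes the Monte Carlo estimate easy to sanity-check; richer queries, such as spatially weighted misses, only change the function applied to each sample.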
This paper proposes a hybrid-fusion-based multimodal emotion recognition system that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of a particular emotion class. The proposed system's architecture has been determined through extensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute the values denoting each speech and image feature's importance. We have also constructed a large-scale dataset (the IIT-R dataset) consisting of speech utterances, corresponding images, and class labels, i.e., 'anger', 'happy', 'hate', and 'sad'. The proposed system has achieved 83.29% emotion recognition accuracy. The enhanced performance of the proposed system advocates the importance of utilizing complementary information from multiple modalities for emotion recognition.
This paper proposes a multimodal emotion recognition system, the VIsual Spoken Textual Additive Net (VISTA Net), to classify the emotions reflected by multimodal input containing an image, speech, and text into discrete classes. A new interpretability technique, the K-Average Additive exPlanation (KAAP), has also been developed to identify the important visual, spoken, and textual features leading to the prediction of a particular emotion class. VISTA Net fuses information from the image, speech, and text modalities using a hybrid of early and late fusion. It automatically adjusts the weights of its intermediate outputs while computing their weighted average, without human intervention. The KAAP technique computes the contribution of each modality and the corresponding features toward predicting a particular emotion class. To mitigate the insufficiency of multimodal emotion datasets labeled with discrete emotion classes, we have constructed the large-scale IIT-R MMEmoRec dataset, consisting of real-life images, corresponding speech and text, and emotion labels ('anger', 'happy', 'hate', and 'sad'). VISTA Net achieves 95.99% emotion recognition accuracy when considering the image, speech, and text modalities, outperforming configurations that consider inputs from any one or two modalities.
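Additive explanation techniques of this kind assign each modality (or feature) a contribution such that the contributions sum to the model's total gain; a classical instantiation is the Shapley value. The sketch below is a minimal exact Shapley computation over two modalities, not the KAAP algorithm itself, and the subset accuracies are made-up illustrative numbers.

```python
from itertools import combinations
import math

def shapley(players, value):
    # Exact Shapley values: weighted marginal contribution of each player
    # over all coalitions of the remaining players.
    n = len(players)
    phi = {}
    for p in players:
        total = 0.0
        others = [q for q in players if q != p]
        for r in range(len(others) + 1):
            for coal in combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                total += w * (value(set(coal) | {p}) - value(set(coal)))
        phi[p] = total
    return phi

# Hypothetical model accuracy for each subset of modalities (illustrative).
acc = {frozenset(): 0.25,
       frozenset({'speech'}): 0.60,
       frozenset({'image'}): 0.55,
       frozenset({'speech', 'image'}): 0.83}

phi = shapley(['speech', 'image'], lambda s: acc[frozenset(s)])
```

By the efficiency property, the contributions sum exactly to the gain of the full model over the empty baseline (here 0.83 - 0.25), which is what makes such attributions comparable across modalities.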
With the surge in popularity of transformers in computer vision, several studies have attempted to determine whether they could be more robust to distribution shifts and provide better uncertainty estimates than convolutional neural networks (CNNs). The almost unanimous conclusion is that they are, and it is often conjectured, more or less explicitly, that the reason for this supposed superiority lies in the self-attention mechanism. In this paper, we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly ConvNeXt) can be as robust and reliable as, and sometimes even more so than, the current state-of-the-art transformers. However, there is no clear winner. Therefore, although it is tempting to state a definitive superiority of one family of architectures over the other, they seem to enjoy similarly extraordinary performance on a variety of tasks while also suffering from similar vulnerabilities, such as texture, background, and simplicity biases.